We show that the interplay between excluded volume effects, hydrophobicity, and hydrogen bonding
in a tube-like representation of a polypeptide chain gives rise to free energy landscapes that
exhibit a small number of metastable minima corresponding to common structural motifs observed in
proteins. The complexity of the landscape increases only moderately with the length of the chain.
Analysis of the temperature dependence of these landscapes reveals that the stability of specific
metastable states is maximal at a temperature close to the mid-point of folding. These metastable
states are therefore likely to be of particular significance in determining the generic tendency of
proteins to aggregate into potentially pathogenic agents.

The mechanism by which proteins fold reliably into their native states is frequently described by
using the concept of a free energy landscape (FEL). Under physiological conditions, the region of
configuration space associated with the native state has the lowest free energy and is therefore
thermodynamically the most stable. In addition to the native and fully unfolded states,
intermediate structures have been detected in the folding and misfolding processes of many
proteins. These metastable states can play an important role in the folding process, or be kinetic
traps that interfere with correct folding. The experimental characterization of these states
remains challenging, as they are often transient or disordered, but significant progress in this
direction has recently been made by combining experiment with theory. Since specific metastable
states can increase the probability of misfolding and aggregation, it is important to understand
the mechanism of their formation. Indeed, a global characterization of the FEL is crucial to a full
understanding of the relationship between the native and metastable states, and the interplay
between folding and misfolding.

A complete characterization of the FEL by compu- 
tational methods requires that the free energies of the 
native and all non-native structures be calculated. Two 
problems arise in this exercise: the technical issue of sam- 
pling a vast number of configurations separated by a va- 
riety of free energy barriers, and the conceptual difficulty 
of choosing an appropriate set of coordinates to describe 
the FEL [13]. Sampling is typically performed in the
vicinity of an ensemble of folding or unfolding pathways, 
and the resulting free energy is then projected on to a 
subspace defined by one or two order parameters Q . For 
the resulting surfaces to provide insight into thermody- 
namics and dynamics, it is crucial that the chosen order 
parameters are able to detect the relevant details of the 
FEL. In cases where a long dynamic trajectory that explores a large region of configuration space
is available, it is possible to mitigate the order parameter problem by using the dynamics to
cluster the configurations and disconnectivity graphs to represent the results [14]. To maintain
the dynamic approach but improve the extent of sampling, methods have been developed that start
from a survey of local minima on the potential energy landscape [15]. A related approach has been
used in which potential energy minima are grouped together by a dynamic criterion based on an
approximate rate theory before their combined free energies are calculated [16].

The level of detail in which a FEL can be computed depends on the choice of protein model. For
fully atomistic representations, analysis has been restricted mostly to peptides and small
proteins. Coarse-grained descriptions allow larger regions of the conformational space to be
explored. In the most tractable models, proteins are confined to a lattice [17]. A great deal of
valuable insight has been gained from this approach, but there are limits to how realistic it can
be made. A promising new model has recently been proposed whose distinctive feature is that the
protein backbone is assigned a finite thickness to account in an effective way for the volume
occupied by the amino acid side chains. The interactions considered include directional hydrogen
bonding (with well depth $e_{HB}$), a local bending stiffness (defined by an energy penalty
$e_S$), and pairwise attractive hydrophobic forces (with energy $e_W$). The protein is thus
regarded as a semi-flexible tube whose radial symmetry is broken by the restraints imposed by the
hydrogen bonds. The excluded volume of the tube makes this model significantly different from
other off-lattice coarse-grained models such as beads-on-strings, and also from Gō models, because
it includes no explicit energetic bias towards a predetermined structure.

The zero-temperature phase diagram of a 24-residue homopolymer was characterised by using this
approach as a function of hydrophobic strength and stiffness, each measured relative to the
hydrogen bonding strength [18]. The resulting set of ground states, similar to those annotating
the figures in this Letter, shows a remarkable resemblance to the structures deposited in the
Protein Data Bank [21], indicating that the model is capable of capturing the range of folds
available to a real polypeptide chain. The directionality of the hydrogen bonding, together with
the excluded volume of the tube (or secondary structure requirements, as shown recently in a
related model [13]), confines the energetically favorable conformations of a polypeptide chain to
a small subset of those that would exist in the absence of such restrictions.

In this Letter, we look beyond the global minima of 
the potential energy surfaces, and characterize the entire 
FEL of the tube model at finite temperatures. By repeat- 
ing the calculations for different values of the parameters 
we examine the effect of hydrophobicity and stiffness on 
both native and metastable states. We also determine 
how the relationship between the various structures on 
the FEL changes with temperature, which has important 
implications for understanding the competition between 
folding, misfolding and aggregation. 

Projecting a FEL on to any set of order parameters runs the risk of concealing or distorting
features of the folding process. To introduce as little prejudgement as possible, we start by
plotting a transition disconnectivity graph (TRDG) [14, 15] in which configurations along a
continuous ergodic trajectory, generated by crankshaft and pivot Monte Carlo steps, are clustered
according to their all-atom root-mean-square displacement (RMSD) using a threshold of 2.2 Å. The
trajectory is mapped onto a graph of nodes, each representing a cluster of configurations, and
edges weighted by the number of transitions between pairs of clusters. The partition function $Z$
of a cluster is proportional to its residence time in the simulation, while that of a transition
state between any pair of nodes is given by the minimum cut that separates them on the graph
[14, 15]. In both cases, the partition function is related to the free energy by
$F/kT = -\ln Z$. The TRDG is constructed from these free energies by placing nodes on a vertical
free energy scale and connecting each pair at the height given by the free energy of the
transition state between them.
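
The ingredients of the TRDG described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the figures: it assumes the trajectory has already been reduced to a list of cluster labels (the hypothetical `labels` argument), takes the residence time of a cluster to be its visit count, symmetrizes the jump counts to obtain edge weights, and evaluates the minimum cut with a plain Edmonds-Karp max-flow routine. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict, deque

def cluster_free_energies(labels):
    """F/kT = -ln Z per cluster, with Z taken as the visit count (assumption)."""
    return {c: -math.log(n) for c, n in Counter(labels).items()}

def transition_counts(labels):
    """Symmetric edge weights: number of observed jumps between two clusters."""
    weights = defaultdict(int)
    for a, b in zip(labels, labels[1:]):
        if a != b:
            weights[(a, b)] += 1
            weights[(b, a)] += 1
    return weights

def min_cut(weights, source, sink):
    """Minimum cut between two distinct nodes, computed as the maximum flow."""
    residual = defaultdict(int, weights)
    nodes = {u for u, _ in weights}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink to collect the path and its bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        flow += push

def transition_state_free_energy(weights, a, b):
    """F/kT of the barrier between clusters a and b, from the minimum cut."""
    return -math.log(min_cut(weights, a, b))
```

In this sketch the node heights come from `cluster_free_energies` and the branch points from `transition_state_free_energy`; both are defined only up to a common additive constant set by the length of the trajectory.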

For stiffness $e_S = 0.4$ and hydrophobicity $e_W = 0.05$ at temperature $T^* = 0.161$ (where all
quantities are specified in units of $e_{HB}$), a simulation was run for a sufficiently long time
to observe spontaneous folding and unfolding about 100 times. The resulting TRDG is shown in
Fig. 1(a). Most of the branches from the main stem are vertically short, indicating low barriers
and thus a smooth surface. The main feature is a well-defined stem corresponding to a funnel
leading to the α-helical native state. Three smaller metastable funnels are also present,
corresponding to three types of three-stranded β-sheets that differ in the length of the strands.
Despite the significant time spent in a variety of unfolded configurations, the extended chain
does not appear explicitly in Fig. 1(a) because the RMSD clustering scheme splits this state into
many sequentially connected groups of unstructured configurations, in contrast to the
structurally well-defined folded states.

FIG. 1: TRDGs for tube parameters $e_S = 0.4$ and $e_W = 0.05$ at temperature $T^* = 0.161$.
Configurations are grouped (a) by RMSD, and (b) by the order parameters $(h, l, n)$.

As noted above, the construction of TRDGs relies on the availability of an ergodic dynamical
trajectory that samples all the configurations of interest. However, equilibrium trajectories of
computationally feasible lengths do not sample metastable states correctly on a complex
landscape, where large free energy barriers must be surmounted to reach them. To mitigate these
problems, biased sampling techniques can be employed, accepting that the resulting trajectories
cannot be interpreted dynamically. It is convenient in such simulations to project the
configuration space on to a set of order parameters; we have chosen $h$, the number of
hydrophobic contacts, and $l$ and $n$, the numbers of local and non-local hydrogen bonds,
respectively. The triplet $(h, l, n)$ can discern the common structural motifs adopted by the
tube. To test the effect of the projection, we apply it to the dynamic trajectory depicted in
Fig. 1(a) to draw a companion TRDG (Fig. 1(b)) where states are defined by the values of $h$, $l$
and $n$, rather than by RMSD clustering. The most prominent feature in Fig. 1(b) is again the
funnel of the α-helix, but now the three metastable β-sheet states appear as one feature, since
the order parameters do not distinguish between them. The projected graph clearly shows the
metastable free energy minimum of the extended chain, which corresponds to a large volume of
configuration space where $h$, $l$ and $n$ are all close to zero.

Although some differences arise from the choice of the clustering method, the two TRDGs in Fig. 1
present a consistent picture of the FEL. For example, the total relative partition function
(obtained by summing the contributions of individual branches) of the three β-sheet structures in
Fig. 1(a) is 2179, while the corresponding funnel in Fig. 1(b) sums to a comparable 2565. The
heights of the free energy barriers are also somewhat affected by the projection, but the
organization of the graph according to structural motifs is not. The $(h, l, n)$ projection will
be used in the biased simulations that follow.
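
As a rough consistency check on the two quoted sums, one can convert them into a free energy difference. This assumes that the relation $F/kT = -\ln Z$ applies to the summed relative partition function of each funnel and that both sums share the same normalization; the labels $Z_{(a)}$ and $Z_{(b)}$ are introduced here only for illustration.

```latex
\frac{\Delta F}{kT}
  = -\ln\frac{Z_{(b)}}{Z_{(a)}}
  = -\ln\frac{2565}{2179}
  \approx -0.16
```

Under these assumptions the two clustering schemes assign the β-sheet funnel free energies that differ by well under one unit of thermal energy.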

To sample the FEL extensively under conditions where large barriers exist, we implemented
umbrella sampling [23] using a parabolic potential in $l$ and $n$. The minimum of the umbrella
potential was placed in turn over points on a grid in these two order parameters to generate a
series of overlapping sampling windows. Each run was further enhanced by parallel tempering with
four or six stages of temperature covering a range over which all states from native to fully
unfolded are explored. The probability histograms $P(h, l, n)$ from all the runs were combined
using the multiple histogram technique, enabling the free energy
$F(h, l, n) = F_{\mathrm{ref}} - kT \ln P(h, l, n)$ to be determined for a wide range of the
order parameters up to the arbitrary constant $F_{\mathrm{ref}}$.
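
The relation between sampled histograms and free energies can be illustrated with a short Python sketch. It handles a single sampling window only and is not the multiple histogram procedure itself: the parabolic form, the spring constant `k` and the window centre `(l0, n0)` are assumed for illustration, and the function names are hypothetical. For one window the bias is removed by subtracting it from $-kT \ln P$ of the biased histogram.

```python
import math
from collections import Counter

def umbrella_bias(l, n, l0, n0, k):
    """Parabolic bias in the two hydrogen-bond order parameters (assumed form)."""
    return 0.5 * k * ((l - l0) ** 2 + (n - n0) ** 2)

def free_energy_from_samples(samples, kT, bias=None):
    """F(h, l, n) = F_ref - kT ln P(h, l, n), shifted so that min(F) = 0.

    `samples` is a list of (h, l, n) triplets from one run. If the run was
    biased, `bias(l, n)` is subtracted to recover the unbiased free energy.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    F = {}
    for (h, l, n), c in counts.items():
        f = -kT * math.log(c / total)
        if bias is not None:
            f -= bias(l, n)
        F[(h, l, n)] = f
    f_min = min(F.values())  # fixes the arbitrary constant F_ref
    return {state: f - f_min for state, f in F.items()}
```

Combining several overlapping windows additionally requires a self-consistent estimate of the relative offsets between them, which this sketch deliberately leaves out.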

Since $F(h, l, n)$ is a function of three variables, it cannot be depicted by a contour plot. We
therefore adopted a graphical representation that shows the organization of basins
(thermodynamically stable regions) and saddles (free energy barriers) of $F(h, l, n)$. A basin is
represented by its lowest point, a local minimum of $F(h, l, n)$ defined by a triplet $(h, l, n)$
for which a unit change in any individual order parameter or combination of order parameters
leads to a higher free energy. A saddle point between two basins is the triplet of lowest free
energy from which both basins can be reached by sequences of downhill stepwise changes in the
order parameters. In this way, a network of nodes (basins) and edges (connecting saddles) is
built up. A landscape topology graph (LTG) was then constructed analogously to a potential energy
disconnectivity graph, by placing the nodes on a vertical free energy axis, and connecting pairs
of nodes at the free energy of the highest saddle on the lowest contiguous path on $F(h, l, n)$
that joins them. Since each basin is represented only by its lowest point, the positions of the
nodes and the vertical heights of the branches in an LTG cannot be quantitatively identified with
the total free energies of structures and barriers in the same way as in a TRDG. However, LTGs
reveal particularly clearly the major features and organization of the FEL and will be seen in
what follows to be consistent with TRDGs.
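
The basin and saddle definitions above translate naturally into a small grid algorithm. The following Python sketch is one possible reading, not the authors' code: it assumes $F$ is stored as a dictionary keyed by integer triplets `(h, l, n)` with no exactly degenerate values, treats all 26 unit steps as neighbours, and finds the merging points by adding grid points in order of increasing free energy with a union-find structure. The function names are hypothetical.

```python
from itertools import product

# Unit change in any single order parameter or any combination of them.
STEPS = [s for s in product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]

def neighbours(p, F):
    """Grid points of F reachable from p by one unit step."""
    for s in STEPS:
        q = (p[0] + s[0], p[1] + s[1], p[2] + s[2])
        if q in F:
            yield q

def local_minima(F):
    """Triplets whose every neighbour has a strictly higher free energy."""
    return [p for p in F if all(F[q] > F[p] for q in neighbours(p, F))]

def lowest_saddles(F):
    """List of (minimum_a, minimum_b, saddle_point, saddle_free_energy).

    Points are added from low to high F. When a new point touches two
    separate basins, it is the lowest point connecting them.
    """
    parent, basin_min, saddles = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for p in sorted(F, key=F.get):
        parent[p] = p
        basin_min[p] = p
        for q in neighbours(p, F):
            if q not in parent:
                continue  # q lies higher and has not been added yet
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            a, b = basin_min[rp], basin_min[rq]
            if a != p and b != p:
                saddles.append((a, b, p, F[p]))
            # The merged basin is represented by its deeper minimum.
            basin_min[rp] = b if F[b] <= F[a] else a
            parent[rq] = rp
    return saddles
```

Placing each minimum at its own free energy and joining branches at the recorded saddle heights then gives a graph of the LTG type, with each merged basin labelled by its deepest minimum.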

Fig. 2 shows LTGs for the 24-residue tube model under a variety of conditions. The top-left panel
is dominated by a deep free energy minimum of an α-helical state. The only prominent metastable
structure under these conditions is the β-sheet, with a β-hairpin and the extended chain only
weakly metastable. The central and right-hand panels of Fig. 2 represent situations in which the
hydrophobicity is increased with all other parameters held fixed. The α-helix and β-sheet are
both destabilized with respect to various αβ combinations and ultimately the β-barrel. By this
stage the α-helix, though present, lies at such high free energy that it is unlikely to play a
role in equilibrium folding.

FIG. 2: LTGs illustrating the effect of: (a) hydrophobicity (with $e_S = 0.04$, $T^* = 0.141$ and
$e_W = 0.04, -0.1, -0.2$), (b) chain stiffness (with $e_W = -0.1$, $T^* = 0.141$ and
$e_S = 0.2, 0.4, 0.6$) and (c) temperature (with $e_S = 0.4$, $e_W = 0.05$ and
$T^* = 0.181, 0.16, 0.141$). The branches that correspond to the structural motifs depicted in
panel (a) are color-coded accordingly.

Fig. 2(b) shows the effect of increasing stiffness at constant hydrophobicity. In all cases, the
α-helix, β-sheet and β-helix are present. Altering the stiffness primarily adjusts the balance of
stability between the α-helix and the various β-strand motifs, while some αβ combinations are
metastable on each FEL. Overall, therefore, the α-helix is favored by low stiffness and
hydrophobicity, while competition arises with β-strands at higher stiffness, and with more
compact structures at higher hydrophobicity. Thus, metastable states are generally present but
are small in number as a result of the energetic and steric restrictions on the configuration
space.

An increase in the number of residues in the polypeptide chain produces only a modest increment
in the complexity of the FEL. We found that chains of 24, 36 and 48 residues at $T^* = 0.12$,
$e_W = 0.05$ and $e_S = 0.4$ exhibit LTGs (not shown) with 5, 18 and 38 minima, respectively.
Importantly, the number of prominent features (long branches) on the graphs remains small,
increasing only from 2 to 4.

If the tube parameters are held fixed while the temperature is varied, we see in Fig. 2(c) that
the balance between the free energies of the α-helix and β-sheet is subtly altered, but, as
expected, the main effect of increasing the temperature is to stabilize the extended chain with
respect to all compact structures. The LTG in the central panel is drawn at the folding
temperature, where folded and unfolded structures compete equally.

The temperature-dependent competition is further examined in Fig. 3 in terms of
$-\ln P(h, l, n)$ (i.e., free energy in units of the associated thermal energy), for the
$(h, l, n)$ of lowest free energy in each structural type. As already seen in Fig. 2(c), the
native α-helix is gradually destabilized with respect to the extended chain with increasing
temperature. However, the stabilities of the β-sheet and αβ combinations change nonmonotonically.
Starting from low temperatures, their thermal populations first rise, but, like those of all
compact structures, then fall because of the high entropy of the extended chain. Non-native
states thus exhibit maximum stability close to the folding temperature while always remaining
metastable. The experimentally determined temperature dependence of the stability of an
amyloidogenic intermediate state of lysozyme (inset of Fig. 3) [26] closely resembles the
competition between stable, metastable and extended states shown in Fig. 3, thus linking our
results to the misfolding process of proteins, which is often accompanied by pathogenic
aggregation.

FIG. 3: Free energies (structure color code as for Fig. 2) as a function of reduced temperature
for $e_S = 0.2$ and $e_W = 0.05$. Inset: comparison with the native (red line), unfolded (black)
and intermediate (blue) states of human lysozyme, where $P$ is the experimentally determined
population from Ref. [26].

The conceptual picture that emerges from the present work is that the characteristic FEL of a
polypeptide chain is dominated by a particular family of related structures, and additionally
contains a small number of metastable states. Our results also indicate that there are specific
conditions under which these metastable states are likely to play a major role in determining the
balance between folding and misfolding.
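
The nonmonotonic stability of non-native states discussed above can be reproduced qualitatively by a toy three-state model. The Python sketch below is purely illustrative: the energies and entropies in `STATES` are hypothetical numbers chosen for this example, not parameters of the tube model, and the folding temperature is taken as the point where the native and extended free energies cross.

```python
import math

# Hypothetical (energy, entropy) pairs in arbitrary reduced units.
STATES = {
    "native":       (-1.0, 0.0),
    "intermediate": (-0.7, 1.0),
    "extended":     (0.0, 6.0),
}

def free_energies(T):
    """F = E - T*S for each state at reduced temperature T."""
    return {name: E - T * S for name, (E, S) in STATES.items()}

def populations(T):
    """Boltzmann populations p_i = exp(-F_i / T) / Z."""
    weights = {name: math.exp(-f / T) for name, f in free_energies(T).items()}
    Z = sum(weights.values())
    return {name: w / Z for name, w in weights.items()}

def peak_temperature(name, temperatures):
    """Temperature on the given grid at which a state's population is largest."""
    return max(temperatures, key=lambda T: populations(T)[name])

temperatures = [0.10 + 0.005 * i for i in range(31)]  # scan 0.10 to 0.25
T_fold = 1.0 / 6.0  # native and extended free energies are equal here
T_peak = peak_temperature("intermediate", temperatures)
```

With these assumed numbers the intermediate never has the lowest free energy, yet its population rises on heating, passes through a maximum in the vicinity of `T_fold`, and then falls as the entropy of the extended state takes over.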